Random Indexing using Statistical Weight Functions
Abstract
Random Indexing is a vector space technique that provides an efficient and scalable approximation to distributional similarity problems. We present experiments showing Random Indexing to be poor at handling large volumes of data and evaluate the use of weighting functions for improving the performance of Random Indexing. We find that Random Indexing is robust for small data sets, but that its performance degrades on large data sets because of the influence of high-frequency attributes. The use of appropriate weight functions improves this significantly.
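The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes sparse ternary index vectors and uses a hypothetical weight function, log(total / frequency), chosen only because it damps high-frequency attributes. The abstract does not say which weight functions the paper evaluates, and every name and parameter below (`index_vector`, `build_context_vectors`, `dim`, `nonzero`) is illustrative.

```python
import math
import random
from collections import Counter, defaultdict


def index_vector(dim, nonzero, rng):
    """Sparse ternary index vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    positions = rng.sample(range(dim), nonzero)
    for i, pos in enumerate(positions):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def build_context_vectors(pairs, dim=256, nonzero=8, weighted=True, seed=0):
    """Build one context vector per term from (term, attribute) observations."""
    rng = random.Random(seed)
    pairs = list(pairs)
    attr_freq = Counter(attr for _, attr in pairs)
    total = len(pairs)

    # One fixed random index vector per attribute (sorted for reproducibility).
    index = {attr: index_vector(dim, nonzero, rng) for attr in sorted(attr_freq)}

    context = defaultdict(lambda: [0.0] * dim)
    for term, attr in pairs:
        # Assumed weight function: log(total / freq) shrinks the contribution
        # of attributes that occur very often; unweighted RI uses 1.0.
        w = math.log(total / attr_freq[attr]) if weighted else 1.0
        vec = context[term]
        for i, x in enumerate(index[attr]):
            if x:
                vec[i] += w * x
    return dict(context)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Calling `build_context_vectors(pairs, weighted=False)` gives plain Random Indexing, so the two settings can be compared on the same co-occurrence data. With this particular weight, an attribute that appears in every observation contributes nothing, which is the extreme case of damping a high-frequency attribute.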
Related resources
Random Indexing Re-Hashed
This paper introduces a modified version of Random Indexing (RI), a technique for dimensionality reduction based on random projections. We describe here how RI can be efficiently implemented using the notion of universal hashing. This eliminates the need to store any random vectors, replacing them instead with a small number of hash functions, thereby dramatically reducing the memory footprint. We d...
Estimation of Variance Components for Body Weight of Moghani Sheep Using B-Spline Random Regression Models
The aim of the present study was to estimate (co)variance components and genetic parameters for body weight of Moghani sheep, using random regression models based on B-spline functions. The data set included 9165 body weight records from 60 to 360 days of age from 2811 Moghani sheep, collected between 1994 and 2013 from Jafar-Abad Animal Research and Breeding Institute, Ardabil province,...
Language Recognition using Random Indexing
Random Indexing is a simple implementation of Random Projections with a wide range of applications. It can solve a variety of problems with good accuracy without introducing much complexity. Here we demonstrate its use for identifying the language of text samples, based on a novel method of encoding letter n-grams into high-dimensional Language Vectors. Further, we show that the method is easil...
Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections
The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously as methods for the discovery of such implicit connections. However, LSA in part...
Probability functions governing the forces and moments induced by random sea waves on a vertical pile
Using statistical characteristics is one way to account for the random nature of ocean waves. Probability functions are used to facilitate the study of random wave parameters, such as the surface, height, and period of the waves. Since ocean wave forces are the principal forces acting on offshore structures, the assignment of the significant structural ...